
 Cairo Governorate






Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.



4 times drinking coffee was illegal--or even punishable by death

Popular Science

Rulers once closed cafés, burned beans, and even executed someone--all for a cup of coffee. A photograph taken in the 1920s shows a group of men gathered at a small roadside coffee stall in Cairo, Egypt. Bach wrote a cantata about it. Scholars, philosophers, and lawyers have argued over it.


Humanoid robot makes architectural history by designing a building

FOX News

The world's first ultra-realistic robot artist Ai-Da Robot can create architectural designs using camera eyes and AI algorithms. Her Space Pod concept envisions modular housing.


A variational Bayes latent class approach for EHR-based patient phenotyping in R

Buckley, Brian, O'Hagan, Adrian, Galligan, Marie

arXiv.org Machine Learning

As regulatory agencies increasingly recognise real-world evidence as a complement to traditional clinical trial data, interest has grown in applying Bayesian methods across both interventional and observational research (Boulanger and Carlin (2021)). A central objective in many clinical investigations is the delineation of patient subgroups that exhibit comparable disease-related characteristics (He, Belouali, Patricoski, Lehmann, Ball, Anagnostou, Kreimeyer, and Botsis (2023)). Electronic Health Records (EHR) have become an important resource for such phenotypic analyses (Hripcsak and Albers (2013)). Bayesian approaches to patient phenotyping in clinical observational studies have been limited by the computational challenges associated with applying the Markov Chain Monte Carlo (MCMC) approach to real-world data. Hubbard, Huang, Harton, Oganisian, Choi, Utidjian, Eneli, Bailey, and Chen (2019) proposed a Bayes latent class model that could be used in a general context for observational studies that use EHR data. They consider the common clinical context where gold-standard phenotype information, such as genetic and laboratory data, is not fully available. A general model of this form has high potential applicability for use in clinical decision support across disease areas for both primary and secondary clinical databases. Latent Class Analysis (LCA) is widely used when we want to identify patient phenotypes or subgroups given multivariate data (Lanza and Rhoades (2013)). A challenge in clinical LCA is the prevalence of mixed data, where we may have combinations of continuous, nominal, ordinal and count data.


Roman generals gifted kittens and piglets to their pet monkeys

Popular Science

The macaques were status symbols all the way from India. Elites in Ancient Rome went to great lengths to advertise their status and wealth. Based on recent archaeological excavations in Egypt, at least some high-ranking military officials even showed off with their choice of pets. Researchers at Poland's University of Warsaw described a nearly 2,000-year-old animal cemetery in the Egyptian port city of Berenike that includes the remains of multiple macaque monkeys.

  Industry: Government > Military (0.51)